1. Introduction

The Platypus family of packages is meant to provide potential pipelines and examples relevant to the broad field of computational immunology. The core set of functions can be found at https://github.com/alexyermanos/Platypus, and examples of use can be found in the publications https://doi.org/10.1093/nargab/lqab023 and insert biorxiv database manuscript here

Stay tuned for updates https://twitter.com/AlexYermanos

To allow for thorough annotation of immune receptor sequence data, the Platypus ecosystem includes functions for database loading, processing, and annotation of both TCRs and BCRs. The currently supported databases are VDJdb, McPAS-TCR, and TBAdb.

2. Loading the VGM

The output of the VDJ_GEX_matrix function is the main object for all downstream functions in Platypus. We can load it directly into the R session from the public data available on PlatypusDB. We will use the VDJ data of CD8+ T cells from several murine models of acute, chronic, and latent viral infections.

library(Platypus)

#source('VDJ_antigen_integrate.R')
#source('VDJ_db_load.R')
#source('VDJ_db_annotate.R')

PlatypusDB_fetch(PlatypusDB.links = c("kuhn2021a//VDJmatrix"),
                 load.to.enviroment = TRUE, combine.objects = FALSE)
## 2022-08-07 19:40:02: Starting download of kuhn2021a__VDJmatrix.RData...
## [1] "kuhn2021a__VDJmatrix"
VDJ <- kuhn2021a__VDJmatrix[[1]]

3. VDJ_db_load for loading the specified databases

The currently supported databases include VDJdb (https://vdjdb.cdr3.net), McPAS-TCR (http://friedmanlab.weizmann.ac.il/McPAS-TCR/), and the PIRD TBAdb (https://doi.org/10.1093/bioinformatics/btz614). We will showcase how to load, process, and filter VDJdb using the VDJ_db_load function.

The databases to load are selected via the databases parameter: “vdjdb” for VDJdb, “mcpas” for McPAS-TCR, “tbadb-bcr” for the BCR subset of TBAdb, and “tbadb-tcr” for the TCR subset.

The database CSV files will be automatically downloaded from their respective websites. If the download fails, we recommend supplying the paths to locally downloaded copies of the databases via the file.paths parameter.

Multiple databases can be loaded and processed concurrently by grouping them in a list (e.g., databases = list("vdjdb", "mcpas")).

If preprocess is set to T, the database will be processed as follows: 1. entries will be filtered by species, either ‘Human’ or ‘Mouse’, as specified in the species parameter; 2. rows with NA sequences will be removed depending on the filter.sequences argument (‘VDJ’ will only remove rows with NA VDJ sequences, ‘VJ’ rows with NA VJ sequences, and ‘VDJ.VJ’ rows with either); 3. remove.na implements additional strategies to remove NA values (‘all’ removes every row containing at least one NA value, ‘common’ only removes rows with NA values in the columns shared with the other databases, which is useful when processing a list of databases, and ‘vgm’ removes rows based on the columns shared between the VDJ and the respective database); 4. keep.only.common will keep only the columns common to all 3 supported databases: ‘VJ_cdr3s_aa’, ‘VDJ_cdr3s_aa’, ‘Species’, ‘Epitope’, ‘Antigen species’.
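
The filtering steps above can be sketched in base R on a toy table. This is only an illustration of the described logic, not the code used by VDJ_db_load: the function preprocess_toy, the table toy_db, the placeholder epitope labels, and the CDR3 strings are all invented for this example, and the remove.na strategies (step 3) are left out.

```r
# Toy stand-in for a raw database table; all values are invented.
toy_db <- data.frame(
  VJ_cdr3s_aa  = c("CAASGNTGKLIF", NA, "CAVRDSNYQLIW", "CALSDYSNNRLTL"),
  VDJ_cdr3s_aa = c("CASSLGGYEQYF", "CASSDAGANTEVFF", NA, "CASSQDRGNTLYF"),
  Species      = c("Mouse", "Mouse", "Mouse", "Human"),
  Epitope      = c("EPI_A", "EPI_B", "EPI_C", "EPI_D"),
  Extra        = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring steps 1, 2 and 4 of the description.
preprocess_toy <- function(db, species = "Mouse", filter.sequences = "VDJ.VJ",
                           common.columns = c("VJ_cdr3s_aa", "VDJ_cdr3s_aa",
                                              "Species", "Epitope",
                                              "Antigen species")) {
  # Step 1: keep entries from a single species
  db <- db[db$Species == species, , drop = FALSE]
  # Step 2: drop rows with missing sequences in the selected chain(s)
  seq_cols <- switch(filter.sequences,
                     "VDJ"    = "VDJ_cdr3s_aa",
                     "VJ"     = "VJ_cdr3s_aa",
                     "VDJ.VJ" = c("VDJ_cdr3s_aa", "VJ_cdr3s_aa"),
                     stop("unknown filter.sequences value"))
  db <- db[stats::complete.cases(db[, seq_cols, drop = FALSE]), , drop = FALSE]
  # Step 4: keep only the shared columns that are present in this table
  db[, intersect(common.columns, colnames(db)), drop = FALSE]
}

preprocess_toy(toy_db, filter.sequences = "VDJ.VJ")
preprocess_toy(toy_db, filter.sequences = "VDJ")
```

With ‘VDJ.VJ’ a row must have both sequences to survive, so it is the strictest of the three settings; ‘VDJ’ keeps rows that lack a VJ sequence.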

If preprocess is set to F, the raw database will be loaded without processing, or saved as a CSV file if output.format is set to ‘save’.

We will load the VDJdb with preprocess set to T and with the default processing arguments.

db <- VDJ_db_load(databases = 'vdjdb',
                  preprocess = TRUE,
                  species = 'Mouse',
                  filter.sequences = 'VDJ.VJ',
                  remove.na = 'common',
                  keep.only.common = TRUE)
names(db)
## [1] "vdjdb"
class(db)
## [1] "list"

4. VDJ_db_annotate for annotating your VDJ dataframe

Next, we will use VDJ_db_annotate to add the epitope information from VDJdb into our VDJ. To do so, we will match the epitope annotations by CDR3s. Currently, VDJ_db_annotate supports two sequence matching methods: exact sequence matching and homology matching as determined by a homology threshold.

VDJ_db_annotate takes as input a list of databases/paths to local CSV files and the VDJ. We can select the annotation features via the database.features parameter, which should be a column name present in all of the databases in the list (e.g., ‘Epitope’).

We will select match = ‘cdr3.aa’ to match by CDR3 amino acid sequences and homology = F for exact matching.

1. Exact feature matching

annotated_VDJ_exact <- VDJ_db_annotate(VDJ = VDJ,
                                       db.list = db,
                                       database.features = 'Epitope',
                                       match = 'cdr3.aa',
                                       homology = FALSE)
## Windows system detected
unique(annotated_VDJ_exact$vdjdb_Epitope)
## [1] NA          "KAVYNFATC" "HGIRNASFI" "SSPPMFRV"
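
As a rough illustration of what exact matching means here, the following base R sketch looks up each cell's CDR3 in a toy database and copies over the epitope when an identical sequence is found. It is not the VDJ_db_annotate implementation: the tables, the toy_Epitope column, the placeholder epitope labels, and the choice to match on the VDJ chain only are assumptions made for the example.

```r
# Toy VDJ table and toy database; all sequences and labels are invented.
toy_vdj <- data.frame(
  barcode      = c("cell1", "cell2", "cell3"),
  VDJ_cdr3s_aa = c("CASSLGGYEQYF", "CASSQDRGNTLYF", "CASSPGTGGYEQYF"),
  stringsAsFactors = FALSE
)
toy_db <- data.frame(
  VDJ_cdr3s_aa = c("CASSLGGYEQYF", "CASSDAGANTEVFF"),
  Epitope      = c("EPI_A", "EPI_B"),
  stringsAsFactors = FALSE
)

# match() returns the position of the first identical database sequence,
# or NA when the CDR3 is absent from the database.
idx <- match(toy_vdj$VDJ_cdr3s_aa, toy_db$VDJ_cdr3s_aa)
toy_vdj$toy_Epitope <- toy_db$Epitope[idx]
toy_vdj
```

Cells whose CDR3 has no identical database entry keep NA, which mirrors the NA entry in the output above.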

2. Homology feature matching

We can perform homology matching by setting homology to TRUE and specifying a Levenshtein distance threshold via the lv.distance parameter.

annotated_VDJ_homology1 <- VDJ_db_annotate(VDJ = VDJ,
                                           db.list = db,
                                           database.features = 'Epitope',
                                           match = 'cdr3.aa',
                                           homology = TRUE,
                                           lv.distance = 1)
## Windows system detected
unique(annotated_VDJ_homology1$vdjdb_Epitope)
## [1] "KAVYNFATC"                                                      
## [2] NA                                                               
## [3] "SSPPMFRV;HGIRNASFI;SSPPMFRV;SSPPMFRV;SSPPMFRV;SSPPMFRV;SSPPMFRV"
## [4] "HGIRNASFI"                                                      
## [5] "SSPPMFRV"                                                       
## [6] "HGIRNASFI;HGIRNASFI;HGIRNASFI;HGIRNASFI"
annotated_VDJ_homology5 <- VDJ_db_annotate(VDJ = VDJ,
                                           db.list = db,
                                           database.features = 'Epitope',
                                           match = 'cdr3.aa',
                                           homology = TRUE,
                                           lv.distance = 5)
## Windows system detected
unique(annotated_VDJ_homology5$vdjdb_Epitope)
##  [1] "KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                                              
##  [2] NA                                                                                                                                                                                                                                                                                                                                                                                                       
##  [3] "KAVYNFATC;KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                                    
##  [4] "KAVANFATM;KAPANFATM;KAPFNFATM;KAPYNFATM;KAVYNFATM;KAPYDYAPI"                                                                                                                                                                                                                                                                                                                                            
##  [5] "SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                                                                              
##  [6] "TVYGFCLL;TVYGFCLL"                                                                                                                                                                                                                                                                                                                                                                                      
##  [7] "SSPPMFRV"                                                                                                                                                                                                                                                                                                                                                                                               
##  [8] "HGIRNASFI"                                                                                                                                                                                                                                                                                                                                                                                              
##  [9] "KAVYNFATC;KAVYNFATC;KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                          
## [10] "SSLENFRAYV"                                                                                                                                                                                                                                                                                                                                                                                             
## [11] "ASNENMETM"                                                                                                                                                                                                                                                                                                                                                                                              
## [12] "LSLRNPILV"                                                                                                                                                                                                                                                                                                                                                                                              
## [13] "ASNENMETM;TVYGFCLL"                                                                                                                                                                                                                                                                                                                                                                                     
## [14] "LSLRNPILV;SSLENFRAYV"                                                                                                                                                                                                                                                                                                                                                                                   
## [15] "SSPPMFRV;HGIRNASFI"                                                                                                                                                                                                                                                                                                                                                                                     
## [16] "SSYRRPVGI;ASNENMETM;ASNENMETM;ASNENMETM"                                                                                                                                                                                                                                                                                                                                                                
## [17] "SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                                      
## [18] "SSYRRPVGI;SSYRRPVGI;SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                                                          
## [19] "SSYRRPVGI;KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                                    
## [20] "VVGAVGVGK"                                                                                                                                                                                                                                                                                                                                                                                              
## [21] "LSLRNPILV;LSLRNPILV"                                                                                                                                                                                                                                                                                                                                                                                    
## [22] "ASNENMETM;KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                                    
## [23] "SSYRRPVGI;SSYRRPVGI;SSLENFRAYV;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSLENFRAYV;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI"
## [24] "SSYRRPVGI;SSLENFRAYV;SSLENFRAYV"                                                                                                                                                                                                                                                                                                                                                                        
## [25] "SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV"                                                                                                                                                                                                                                                                                                                                                 
## [26] "ASNENMETM;ASNENMETM"                                                                                                                                                                                                                                                                                                                                                                                    
## [27] "SSLENFRAYV;SSLENFRAYV"                                                                                                                                                                                                                                                                                                                                                                                  
## [28] "SSYRRPVGI;SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                                                                    
## [29] "SSYRRPVGI;SSLENFRAYV;SSLENFRAYV;SSYRRPVGI;SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                                    
## [30] "SSLENFRAYV;SSLENFRAYV;SSYRRPVGI;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;LSLRNPILV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV;SSLENFRAYV"                                                                                                                                                                                                               
## [31] "SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI;SSYRRPVGI"                                                                                                                                                                                                                                                                                                                                            
## [32] "TVYGFCLL"                                                                                                                                                                                                                                                                                                                                                                                               
## [33] "SQLLNAKYL"                                                                                                                                                                                                                                                                                                                                                                                              
## [34] "LSLRNPILV;SSYRRPVGI;LSLRNPILV;KAVYNFATC"                                                                                                                                                                                                                                                                                                                                                                
## [35] "SSPPMFRV;HGIRNASFI;SSPPMFRV;SSPPMFRV;SSPPMFRV;SSPPMFRV;SSPPMFRV"                                                                                                                                                                                                                                                                                                                                        
## [36] "KAVANFATM;KAPANFATM;KAPFNFATM;KAPYNFATM;KAVYNFATM;KAPYDYAPI;ASNENMETM;KAVYNFATC"                                                                                                                                                                                                                                                                                                                        
## [37] "HGIRNASFI;HGIRNASFI;HGIRNASFI;HGIRNASFI"                                                                                                                                                                                                                                                                                                                                                                
## [38] "HGIRNASFI;HGIRNASFI"
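
To make the effect of the distance threshold more concrete, here is a minimal base R sketch of threshold-based matching. It uses utils::adist (generalized Levenshtein distance) and joins the epitopes of all database rows within the threshold with ‘;’. The helper annotate_homology_toy, the toy sequences, and the placeholder labels are hypothetical; the sketch is one plausible reading of the semicolon-separated output above, and the actual implementation in VDJ_db_annotate may differ.

```r
# Toy query CDR3s and toy database; all sequences and labels are invented.
toy_queries <- c("CASSLGGYEQYF", "CASSLGGYEQFF", "CAWSLDRGNTLYF")
toy_db <- data.frame(
  VDJ_cdr3s_aa = c("CASSLGGYEQYF", "CASSLGGYEQYF", "CASSLGAYEQYF"),
  Epitope      = c("EPI_A", "EPI_A", "EPI_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect the epitopes of every database row whose
# CDR3 lies within lv.distance edits of the query sequence.
annotate_homology_toy <- function(queries, db, lv.distance) {
  # Distance matrix: one row per query, one column per database entry
  d <- utils::adist(queries, db$VDJ_cdr3s_aa)
  vapply(seq_along(queries), function(i) {
    hits <- db$Epitope[d[i, ] <= lv.distance]
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
  }, character(1))
}

annotate_homology_toy(toy_queries, toy_db, lv.distance = 0)
annotate_homology_toy(toy_queries, toy_db, lv.distance = 1)
```

In this sketch, an epitope is repeated when several database rows with the same annotation fall within the threshold, and a larger threshold can only add matches, never remove them. This is why looser thresholds produce longer and more mixed annotation strings, and why they should be interpreted with caution.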

5. Visualizing epitope annotations via VDJ_abundances

We will use VDJ_abundances to obtain the number of cells annotated with each epitope, using the exact and homology matching results from above.

p1 <- VDJ_abundances(annotated_VDJ_exact, 
                     feature.columns = 'vdjdb_Epitope', 
                     grouping.column = 'sample_id', 
                     sample.column = 'none', 
                     output.format = 'barplot')

p2 <- VDJ_abundances(annotated_VDJ_homology1, 
                     feature.columns = 'vdjdb_Epitope', 
                     grouping.column = 'sample_id', 
                     sample.column = 'none', 
                     output.format = 'barplot')

p3 <- VDJ_abundances(annotated_VDJ_homology5, 
                     feature.columns = 'vdjdb_Epitope', 
                     grouping.column = 'sample_id', 
                     sample.column = 'none', 
                     output.format = 'barplot')
#Exact matching
p1
## [[1]]

#Homology matching threshold = 1
p2
## [[1]]

#Homology matching threshold = 5
p3
## [[1]]
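
The quantity shown in these barplots, the number of cells per epitope annotation within each sample, can also be tabulated by hand. The sketch below does this with base R on an invented toy table; the column toy_Epitope and the placeholder labels are assumptions for illustration, and the code does not reproduce the internals of VDJ_abundances.

```r
# Toy annotated table: one row per cell; all values are invented.
toy_annotated <- data.frame(
  sample_id   = c("s1", "s1", "s1", "s2", "s2"),
  toy_Epitope = c("EPI_A", "EPI_A", NA, "EPI_B", "EPI_A"),
  stringsAsFactors = FALSE
)

# Cross-tabulate samples against epitope annotations, keeping
# unannotated cells (NA) as their own category.
tab <- table(sample_id = toy_annotated$sample_id,
             epitope   = toy_annotated$toy_Epitope,
             useNA     = "ifany")
tab

# Long format: one row per sample/epitope combination with its cell count
counts <- as.data.frame(tab, responseName = "n_cells")
counts
```

Keeping the NA category visible is a deliberate choice here, since the share of unannotated cells is often the largest group and puts the annotated counts into context.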

6. Version information

## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
## [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Platypus_3.4.1
## 
## loaded via a namespace (and not attached):
##  [1] stringdist_0.9.8   Rcpp_1.0.9         ape_5.6-2          lattice_0.20-45   
##  [5] tidyr_1.2.0        assertthat_0.2.1   rprojroot_2.0.3    digest_0.6.29     
##  [9] foreach_1.5.2      utf8_1.2.2         R6_2.5.1           evaluate_0.15     
## [13] highr_0.9          ggplot2_3.3.6      pillar_1.8.0       ggfun_0.0.6       
## [17] yulab.utils_0.0.5  rlang_1.0.4        lazyeval_0.2.2     rstudioapi_0.13   
## [21] jquerylib_0.1.4    rmarkdown_2.14     pkgdown_2.0.6      labeling_0.4.2    
## [25] textshaping_0.3.6  desc_1.4.1         stringr_1.4.0      munsell_0.5.0     
## [29] compiler_4.2.1     xfun_0.31          pkgconfig_2.0.3    systemfonts_1.0.4 
## [33] gridGraphics_0.5-1 htmltools_0.5.3    tidyselect_1.1.2   tibble_3.1.8      
## [37] codetools_0.2-18   fansi_1.0.3        dplyr_1.0.9        grid_4.2.1        
## [41] nlme_3.1-157       jsonlite_1.8.0     gtable_0.3.0       lifecycle_1.0.1   
## [45] DBI_1.1.3          magrittr_2.0.3     scales_1.2.0       tidytree_0.3.9    
## [49] cli_3.3.0          stringi_1.7.8      cachem_1.0.6       farver_2.1.1      
## [53] fs_1.5.2           doParallel_1.0.17  ggtree_3.4.1       bslib_0.4.0       
## [57] ellipsis_0.3.2     ragg_1.2.2         generics_0.1.3     vctrs_0.4.1       
## [61] iterators_1.0.14   tools_4.2.1        treeio_1.20.1      ggplotify_0.1.0   
## [65] glue_1.6.2         purrr_0.3.4        parallel_4.2.1     fastmap_1.1.0     
## [69] yaml_2.3.5         colorspace_2.0-3   aplot_0.1.6        memoise_2.0.1     
## [73] knitr_1.39         patchwork_1.1.1    sass_0.4.2